Abstract: In this work we give a concise definition of information
loss from a system-theoretic point of view. Based on this
definition, we analyze the information loss in memoryless
input-output systems subject to a continuous-valued input. For a certain
class of multiple-input, multiple-output systems the information
loss is quantified. An interpretation of this loss is accompanied
by upper bounds which are simple to evaluate.

Finally, a class of systems is identified for which the informa- 
tion loss is necessarily infinite. Quantizers and limiters are shown 
to belong to this class. 

I. Introduction 

In the XXXI. Shannon lecture Han argued that information
theory links information-theoretic quantities, such as entropy
and mutual information, to operational quantities such as
source, channel, capacity, and error probability [1]. In this
work we try to make a new link to an operational quantity
not mentioned by Han: information loss. Information can be
lost, on the one hand, in erasures or due to superposition of
noise as it is known from communication theory. Dating back
to Shannon [2] this loss is linked to the conditional entropy
of the input given the output, at least in discrete-amplitude,
memoryless settings. On the other hand, as stated by the data
processing inequality (DPI, [3]), information can be lost in
deterministic, noiseless systems. It is this kind of loss that we
will treat in this work, and we will show that it makes sense
to link it to the same information-theoretic quantity.

The information loss in input-output systems is very
sparsely covered in the literature. Aside from the DPI for
discrete random variables (RV) and static systems, some results
are available for jointly stationary stochastic processes [4]. Yet,
all these results just state that information is lost, without
quantifying this loss. Only in [5] is the information lost by
collapsing states of a discrete-valued stochastic process
quantified as the difference between the entropy rates at the
input and the output of the memoryless system.

Conversely, energy loss in input-output systems has been 
deeply analyzed, leading to meaningful definitions of transfer 
functions and notions of passivity, stability, and losslessness. 
Essentially, it is our aim to develop a system theory not 
from an energetic, but from an information-theoretic point of 
view. So far we have analyzed the information loss of
discrete-valued stationary stochastic processes in finite-dimensional
dynamical input-output systems [6], where we proposed an
upper bound on the information loss and identified a class
of information-preserving systems (the information-theoretic
counterpart to lossless systems). In [7] the information loss
of continuous RVs in memoryless systems was quantified and
bounded in a preliminary way. In this work, extending [7], we
analyze the information loss for static multiple-input,
multiple-output systems which are subject to a continuous input RV.
Unlike in our previous work, we permit functions which lose
an infinite amount of information and present the corresponding
conditions. Aside from that we provide a link between
information loss and differential entropy, a quantity which
is not invariant under changes of variables. The next steps
towards an information-centered system theory are the analysis
of discrete-time dynamical systems with continuous-valued
stationary input processes and a treatment of information loss
in multirate systems.

In the remainder of this paper we give a mathematically
concise definition of information loss (Section II). After
restricting the class of systems in Section III, in Section IV we
provide exact results for information loss together with simple
bounds, and establish a link to differential entropies. Finally,
in Section V we show under which conditions the information
loss becomes infinite.

This manuscript is an extended version of a paper submitted 
to a conference. 

II. A Definition of Information Loss 

When talking about the information loss induced by pro- 
cessing of signals, it is of prime importance to accompany this 
discussion by a well-based definition of information loss going 
beyond, but without lacking, intuition. Further, the definition 
shall also allow generalizations to stochastic processes and 
dynamical systems without contradicting previous statements. 
We try to meet both objectives with the following 

Definition 1. Let $X$ be an RV¹ on the sample space $\mathcal{X}$, and let
$Y$ be obtained by transforming $X$. We define the information
loss induced by this transform as

$$L(X \to Y) = \sup_{\mathcal{P}} \left( I(\hat{X};X) - I(\hat{X};Y) \right) \tag{1}$$

where the supremum is over all partitions $\mathcal{P}$ of $\mathcal{X}$, and where
$\hat{X}$ is obtained by quantizing $X$ according to the partition $\mathcal{P}$
(see Fig. 1).

Fig. 1. Model for computing the information loss of a memoryless
input-output system g. Q is a quantizer with partition $\mathcal{P}$.

¹ Note that $X$ and all other involved RVs need not be scalar-valued.

This Definition is motivated by the data processing
inequality (cf. [3]), which states that the expression under the
supremum is always non-negative: Information loss is the
worst-case reduction of information about $X$ induced by
transforming $X$. We now try to shed a little more light on
Definition 1 in the following

Theorem 1. The information loss of Definition 1 is given by:

\begin{align}
L(X \to Y) &= \lim_{\hat{X} \to X} \left( I(\hat{X};X) - I(\hat{X};Y) \right) \tag{2}\\
&= H(X|Y) \tag{3}
\end{align}

Proof: We start by noticing that

$$I(\hat{X};X) - I(\hat{X};Y) = H(\hat{X}|Y) \le H(X|Y) \tag{4}$$

by the definition of mutual information and since both $\hat{X}$ and
$Y$ are functions of $X$. The inequality in (4) is due to data
processing ($\hat{X}$ is a function of $X$). We now show that in the
supremum over all partitions equality can be achieved.

To this end, observe that among all partitions of the sample
space of $X$ there is a sequence $\{\mathcal{P}_n\}$ of increasingly fine
partitions² such that

$$\lim_{n \to \infty} \hat{X}_n = X \tag{5}$$

where $\hat{X}_n$ is the quantization of $X$ induced by partition $\mathcal{P}_n$.
By the axioms of entropy (e.g., [8, Ch. 14]), $H(\hat{X}_n|Y)$ is an
increasing sequence in $n$ with limit $H(X|Y)$. Thus, this limit
represents the supremum in Definition 1, which proves (3).

Note further that each converging sequence $\hat{X} \to X$
contains a converging subsequence $\hat{X}_n \to X$ satisfying (5),
and where $\hat{X}_{n+1}$ is obtained by refining the partition inducing
$\hat{X}_n$. Therefore,

$$\lim_{\hat{X} \to X} H(\hat{X}|Y) = \lim_{n \to \infty} H(\hat{X}_n|Y) = H(X|Y) \tag{6}$$

which completes the proof. ■
This Theorem shows that the supremum in Definition 1 is
achieved for $\hat{X} = X$, i.e., when we compute the difference
between the self-information of the input and the information
the output of the system contains about its input. This
difference was shown to be identical to the conditional entropy of
the input given the output, the quantity which is also used for
quantifying the information loss due to noise or erasures (in
the discrete-valued, memoryless case). In addition to that, the
Theorem suggests a natural way to measure the information
loss via measuring mutual informations, as it is depicted in
Fig. 1. As we will see later (cf. Theorem 3), the considered
partition does not have to be infinitely fine, but indeed a
comparably coarse partition can deliver the correct result.

² I.e., $\mathcal{P}_{n+1}$ is a refinement of $\mathcal{P}_n$.
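The statement of Definition 1 and Theorem 1 can be tried out on a finite toy model. The sketch below is only an illustration under assumptions that are not part of the paper: a uniform input on the eight points 0,...,7, the two-to-one map g(x) = x mod 4, and partitions into equally sized blocks of neighboring points. The helper names `mutual_information` and `loss_for_partition` are hypothetical.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(pairs):
    """I(A;B) in bits for a list of equally likely outcome pairs (a, b)."""
    n = len(pairs)
    h_a = entropy(c / n for c in Counter(a for a, _ in pairs).values())
    h_b = entropy(c / n for c in Counter(b for _, b in pairs).values())
    h_ab = entropy(c / n for c in Counter(pairs).values())
    return h_a + h_b - h_ab


def loss_for_partition(xs, g, cells):
    """I(Xhat;X) - I(Xhat;Y) for a partition of xs into `cells` equal blocks."""
    width = len(xs) // cells
    xhat = [x // width for x in xs]          # quantizer output for each input point
    i_x = mutual_information(list(zip(xhat, xs)))
    i_y = mutual_information(list(zip(xhat, [g(x) for x in xs])))
    return i_x - i_y


XS = list(range(8))        # assumed toy alphabet with uniform input


def G(x):
    """Assumed two-to-one map, so that H(X|Y) is one bit."""
    return x % 4


for cells in (1, 2, 4, 8):
    print(cells, round(loss_for_partition(XS, G, cells), 6))
```

In this toy model the trivial partition yields zero, while every finer partition tried here already yields the full conditional entropy of one bit, which matches the remark that a comparably coarse partition can suffice.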

III. Problem Statement 

Let $X = [X_1, X_2, \dots, X_N]$ be an $N$-dimensional RV
with a probability measure $P_X$ absolutely continuous w.r.t.
the Lebesgue measure $\mu$ ($P_X \ll \mu$). We require $P_X$ to be
concentrated on $\mathcal{X} \subseteq \mathbb{R}^N$. This RV, which possesses a unique
probability density function (PDF) $f_X$, is the input to the
following multivariate, vector-valued function:

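Definition 2. Let $g\colon \mathcal{X} \to \mathcal{Y}$, $\mathcal{X}, \mathcal{Y} \subseteq \mathbb{R}^N$, be a surjective,
Borel-measurable function defined in a piecewise manner:

$$g(x) = \begin{cases} g_1(x), & \text{if } x \in \mathcal{X}_1 \\ g_2(x), & \text{if } x \in \mathcal{X}_2 \\ \vdots & \end{cases} \tag{7}$$

where $x = [x_1, x_2, \dots, x_N]$ and $g_i\colon \mathcal{X}_i \to \mathcal{Y}_i$ bijectively³.
Furthermore, let the Jacobian matrix $J_g(\cdot)$ exist on the closures
of $\mathcal{X}_i$. In addition to that, we require the Jacobian determinant,
$|\det J_g(\cdot)|$, to be non-zero $P_X$-almost everywhere.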

In accordance with previous work [7] the $\mathcal{X}_i$ are disjoint
sets of positive $P_X$-measure which unite to $\mathcal{X}$, i.e., $\bigcup_i \mathcal{X}_i = \mathcal{X}$
and $\mathcal{X}_i \cap \mathcal{X}_j = \emptyset$ if $i \neq j$. Clearly, also the $\mathcal{Y}_i$ unite to $\mathcal{Y}$, but
need not be disjoint. This definition ensures that the preimage
$g^{-1}[y]$ of each element $y \in \mathcal{Y}$ is a countable set.

Using the method of transformation [8, pp. 244] one
obtains the PDF of the $N$-dimensional output RV $Y =
[Y_1, Y_2, \dots, Y_N]$ as

$$f_Y(y) = \sum_{x_i \in g^{-1}[y]} \frac{f_X(x_i)}{|\det J_g(x_i)|} \tag{8}$$

where the sum is over all elements of the preimage. Note
that since $Y$ possesses a density, the corresponding probability
measure $P_Y$ is also absolutely continuous w.r.t. the Lebesgue
measure.
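As a small illustration of the preimage sum in (8), the following sketch evaluates it for the univariate map g(x) = x² with a standard Gaussian input and compares the result with the known chi-square density with one degree of freedom. The function names are hypothetical, and the choice of g and of the input density is an assumption made only for this example.

```python
import math


def gaussian_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def output_pdf_square(y, f_x=gaussian_pdf):
    """PDF of Y = X**2 at y > 0 via the preimage sum of (8).

    The preimage of y is {+sqrt(y), -sqrt(y)}; in the univariate case the
    Jacobian determinant is the derivative, here |g'(x)| = |2x|.
    """
    root = math.sqrt(y)
    return sum(f_x(x) / abs(2.0 * x) for x in (root, -root))


def chi2_pdf_one_dof(y):
    """Closed-form chi-square density with one degree of freedom, y > 0."""
    return math.exp(-0.5 * y) / math.sqrt(2.0 * math.pi * y)


for y in (0.1, 1.0, 2.5):
    print(y, output_pdf_square(y), chi2_pdf_one_dof(y))
```

Both columns agree up to rounding, since each of the two preimage points contributes the same term for a symmetric input density.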

IV. Main Results 
We now state our main results: 

Theorem 2. The information loss induced by a function g
satisfying Definition 2 is given as

$$H(X|Y) = \int_{\mathcal{X}} f_X(x) \log \left( \frac{\sum_{x_i \in g^{-1}[g(x)]} \frac{f_X(x_i)}{|\det J_g(x_i)|}}{\frac{f_X(x)}{|\det J_g(x)|}} \right) dx. \tag{9}$$



The proof of this Theorem can be found in the Appendix
and, in a modified version for univariate functions, in [7].
Note that for univariate functions the Jacobian determinant
is replaced by the derivative of the function.

³ In the univariate case, i.e., for $N = 1$, this is equivalent to requiring that
g is piecewise strictly monotone.



Corollary 1. The information loss induced by a function g
satisfying Definition 2 is given as

$$H(X|Y) = h(X) - h(Y) + E\{\log|\det J_g(X)|\} \tag{10}$$

Proof: The proof is obtained by recognizing the PDF of
$Y$ inside the logarithm in (9) and by splitting the logarithm.
■
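The splitting step of this proof can be written out explicitly. The lines below are a sketch that uses only (8) and (9); the intermediate expectation $E\{\log f_Y(Y)\}$ is notation added here for readability.

```latex
\begin{align*}
H(X|Y) &= \int_{\mathcal{X}} f_X(x) \log \frac{f_Y(g(x))\,|\det J_g(x)|}{f_X(x)}\,dx \\
       &= -\int_{\mathcal{X}} f_X(x) \log f_X(x)\,dx
          + \int_{\mathcal{X}} f_X(x) \log f_Y(g(x))\,dx
          + \int_{\mathcal{X}} f_X(x) \log|\det J_g(x)|\,dx \\
       &= h(X) + E\{\log f_Y(Y)\} + E\{\log|\det J_g(X)|\} \\
       &= h(X) - h(Y) + E\{\log|\det J_g(X)|\}
\end{align*}
```

The first line replaces the sum inside the logarithm of (9) by $f_Y(g(x))$ according to (8); the last line uses $h(Y) = -E\{\log f_Y(Y)\}$.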

This result is particularly interesting because it provides
a link between information loss and differential entropies
already anticipated in [8, pp. 660]. There, it was claimed that

$$h(Y) \le h(X) + E\{\log|\det J_g(X)|\} \tag{11}$$

where equality holds iff g is bijective. While (11) is actually
another version of the DPI, Corollary 1 quantifies how much
information is lost by processing. In addition to that, a
very similar expression denoted as folding entropy has been
presented in [9], although in a completely different setting
analyzing the entropy production of autonomous dynamical
systems.

We now introduce a discrete RV $W$ which depends on the
set $\mathcal{X}_i$ from which $X$ was taken. In other words, for all $i$
we have $W = w_i$ iff $x \in \mathcal{X}_i$. One can interpret this RV as
being generated by a vector quantization of $X$ with a partition
$\mathcal{P} = \{\mathcal{X}_i\}$. With this new RV we can state

Theorem 3. The information loss is identical to the
uncertainty about the set $\mathcal{X}_i$ from which the input was taken, i.e.,

$$H(X|Y) = H(W|Y). \tag{12}$$

The proof follows closely the proof provided in [7] and
thus is omitted. However, this equivalence suggests a way of
measuring information loss by means of proper quantization:
Since $H(W|Y) = I(W;X) - I(W;Y)$, the loss can be
determined by measuring mutual informations, which in this
case are always finite (or, at least, bounded by $H(W)$). In
contrast, the mutual informations in (2) of Theorem 1
diverge to infinity. This expression was used in [7] for the
information loss, highlighting the fact that both the
self-information of $X$ and the information transfer from $X$ to $Y$
are infinite.

The interpretation derived from Theorem 3 allows us now
to provide upper bounds on the information loss:

Theorem 4. The information loss is upper bounded by

\begin{align}
H(X|Y) &\le \int_{\mathcal{Y}} f_Y(y) \log\left|g^{-1}[y]\right| dy \tag{13}\\
&\le \log\left( \int_{\mathcal{Y}} f_Y(y) \left|g^{-1}[y]\right| dy \right) \tag{14}\\
&\le \max_{y} \log\left|g^{-1}[y]\right|. \tag{15}
\end{align}

Proof: We give here only a sketch of the proof: The first
inequality results from bounding $H(W|Y = y)$ by the entropy
of a uniform distribution on the preimage of $y$. Jensen's
inequality yields the second line of the Theorem. The coarsest
bound is obtained by replacing the cardinality of the preimage
by its maximal value. ■



In this Theorem, we bounded the information loss given a
certain output by the cardinality of the preimage. While the
first bound considers the fact that the cardinality may actually
depend on the output itself, the last bound incorporates the
maximum cardinality only. In cases where the function from
Definition 2 is defined not on a countable but on a finite
number of subdomains, this finite number can act as an upper
bound (cf. [7]). Another straightforward upper bound, which
is independent of the bounds in Theorem 4, is obtained from
Theorem 3 by removing conditioning:

$$H(X|Y) \le H(W) = -\sum_{i} p_i \log p_i \tag{16}$$

where $p_i = P_X(\mathcal{X}_i) = \int_{\mathcal{X}_i} f_X(x)\,dx$. It has to be noted,
though, that depending on the function g all these bounds
can be infinite while the information loss remains finite.

A further implication of introducing this discrete RV W is 
that it allows us to perform investigations about reconstructing 
the input from the output. Currently, a Fano-type inequality 
bounding the reconstruction error by the information loss is 
under development. In addition to that, new upper bounds 
on the information loss related to the reconstruction error of 
optimal (in the maximum a posteriori sense) and of simpler, 
sub-optimal estimators are analyzed. 

V. Functions with Infinite Information Loss 

We now drop the requirement of local bijectivity in
Definition 2 to analyze a wider class of surjective, Borel-measurable
functions $g\colon \mathcal{X} \to \mathcal{Y}$. We keep the requirement that $P_X \ll \mu$
and thus $X$ possesses a density $f_X$ (positive on $\mathcal{X}$ and zero
elsewhere). We maintain

Theorem 5. Let $g\colon \mathcal{X} \to \mathcal{Y}$ be a Borel-measurable function
and let the continuous RV $X$ be the input to this function. If
there exists a set $B \subseteq \mathcal{Y}$ of positive $P_Y$-measure such that
the preimage $g^{-1}[y]$ is uncountable for every $y \in B$, then the
information loss is infinite.

Proof: We notice that since $B \subseteq \mathcal{Y}$,

\begin{align}
H(X|Y) &= \int_{\mathcal{Y}} H(X|Y = y)\,dP_Y(y) \tag{17}\\
&\ge \int_{B} H(X|Y = y)\,dP_Y(y) \tag{18}
\end{align}

where the integrals are now written as Lebesgue integrals,
since $P_Y$ now does not necessarily possess a density.

Since on $B$ the preimage of every element is uncountable,
we obtain with [4] and the references therein $H(X|Y = y) =
\infty$ for all $y \in B$, and, thus, $H(X|Y) = \infty$. ■

Note that the requirement of $B$ being a set of positive
$P_Y$-measure cannot be dropped, as Example 4 in Section VI
illustrates. We immediately obtain the following

Corollary 2. Let $g\colon \mathcal{X} \to \mathcal{Y}$ be a Borel-measurable function
and let the continuous RV $X$ be the input to this function. If
the probability measure of the output, $P_Y$, possesses a
non-vanishing discrete component, the information loss is infinite.



Proof: According to the Lebesgue-Radon-Nikodym
theorem [10, pp. 121] every measure can be decomposed into a
component absolutely continuous w.r.t. $\mu$ and a component
singular to $\mu$. The latter part can further be decomposed
into a singular continuous and a discrete part, where the
latter places positive $P_Y$-mass on points. Let $y^*$ be such
a point, i.e., $P_Y(y^*) > 0$. As an immediate consequence,
$P_X(g^{-1}[y^*]) > 0$, which is only possible if $g^{-1}[y^*]$ is
uncountable ($P_X \ll \mu$). ■

This result is also in accordance with intuition, as the
analysis of a simple quantizer shows: While the entropy of
the input RV is infinite ($I(\hat{X};X) \to \infty$ for $\hat{X} \to X$; cf. [8,
pp. 654]), the quantized output can contain only a finite
amount of information ($I(\hat{X};Y) \to H(Y) < \infty$). In addition
to that, the preimage of each possible output value $y$ is a set of
positive $P_X$-measure. The loss, as a consequence, is infinite.
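The divergence for a quantizer can be made tangible with a finite approximation. The sketch below assumes an input that is uniform on [0, 1), a 2-bit uniform quantizer as the system, and n-bit uniform quantizations as the increasingly fine partitions of Definition 1; these choices and the function names are illustrative assumptions only.

```python
import math
from collections import Counter


def cond_entropy_bits(pairs):
    """H(A|B) in bits for a list of equally likely outcome pairs (a, b)."""
    n = len(pairs)
    h_joint = -sum(c / n * math.log2(c / n) for c in Counter(pairs).values())
    h_b = -sum(c / n * math.log2(c / n)
               for c in Counter(b for _, b in pairs).values())
    return h_joint - h_b


def quantizer_loss_bits(input_bits, output_bits=2):
    """H(Xhat_n | Y) for an input_bits-bit partition and an output_bits-bit quantizer.

    Requires input_bits >= output_bits. For a uniform input the cells of the
    finer partition are equally likely, and the system output is obtained by
    dropping the least significant bits of the cell index.
    """
    shift = input_bits - output_bits
    pairs = [(k, k >> shift) for k in range(2 ** input_bits)]
    return cond_entropy_bits(pairs)


for n in (2, 4, 8, 12):
    print(n, quantizer_loss_bits(n))
```

Under these assumptions the value grows by one bit with every additional bit of resolution of the partition, so the supremum in Definition 1 is not finite.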

While for the quantizer the preimage of each possible output
value is a set of positive measure, there certainly are functions
for which some outputs have a countable preimage and some
whose preimage is a non-null set. An example of such a system
is the limiter [8, Ex. 5-4]. For such systems it can be shown
that both the information loss $L(X \to Y) = H(X|Y)$ and
the information transfer $I(X;Y)$ are infinite.

Finally, there exist functions g for which the preimages of
all output values $y$ are null sets, but which still fulfill the
conditions of Theorem 5. Functions which project $\mathcal{X}$ on a
lower-dimensional subspace of $\mathbb{R}^N$ fall into that category.



VI. Examples 

In this Section we illustrate our theoretical results with the 
help of examples. The logarithm is taken to base 2 unless 
otherwise noted. 

A. Example 1: A two-dimensional transform with finite infor- 
mation loss 

Let $X$ be uniformly distributed on the square $\mathcal{X} = [-a, a] \times
[-a, a]$. Equivalently, the two constituent RVs $X_1$ and $X_2$ are
independent and uniformly distributed on $[-a, a]$. In other
words, while $f_X(x) = 1/(4a^2)$ for all $x \in \mathcal{X}$, we have
$f_{X_i}(x_i) = 1/(2a)$ for $x_i \in [-a, a]$ and $i = 1, 2$.

We consider a function g performing the mapping:

\begin{align}
Y_1 &= X_1 \tag{19}\\
Y_2 &= |X_1 - X_2| \tag{20}
\end{align}


The corresponding Jacobian matrix is a triangular matrix

$$J_g(x) = \begin{bmatrix} 1 & 0 \\ \operatorname{sgn}(x_1 - x_2) & \operatorname{sgn}(x_2 - x_1) \end{bmatrix} \tag{21}$$

where sgn(·) is the sign function. From this it immediately
follows that the magnitude of the determinant of the Jacobian
matrix is unity for all possible values of $X$, i.e., $|\det J_g(x)| =
1$ for all $x \in \mathcal{X}$. The subsets of $\mathcal{X}$ on which the partitioned
functions $g_i$ are bijective are not intervals in this case; they are
the triangular halves of the square induced by $x_1 \gtrless x_2$ (see
Fig. 2).

Fig. 2. Subdomains of Example 1. The partitioned functions restricted
to the domain of either color are bijective. Furthermore, the overall function
g is bijective in areas with light shading.

The preimage of $g(x)$ is, in any case,

$$g^{-1}[g(x)] = \{[x_1, x_2], [x_1, 2x_1 - x_2]\} \cap \mathcal{X}. \tag{24}$$

The transform g is bijective whenever $[x_1, 2x_1 - x_2] \notin \mathcal{X}$, i.e.,
if $|2x_1 - x_2| > a$.

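With the PDF of $X$ and of its components we obtain for
the information loss

$$H(X|Y) = \int_{\mathcal{X}} \frac{1}{4a^2} \log\left( \frac{\frac{1}{2a} + f_{X_2}(2x_1 - x_2)}{\frac{1}{2a}} \right) dx_1\,dx_2 \tag{25}$$

which is non-zero only if $-a \le 2x_1 - x_2 \le a$ (numerator and
denominator cancel otherwise; no loss occurs in the bijective
domain of the function). As a consequence,

\begin{align}
H(X|Y) &= \frac{1}{4a^2} \iint_{\{x \in \mathcal{X}:\, |2x_1 - x_2| \le a\}} \log 2 \; dx_1\,dx_2 \tag{26}\\
&= \frac{1}{2}. \tag{27}
\end{align}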


The information loss is identical to a half bit. This is intuitive
when looking at Fig. 2, where it can be seen that information
loss occurs only on one half of the domain $\mathcal{X}$ (shaded
in stronger colors). By destroying the sign information, in this
area the information loss is equal to one bit.
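The half-bit result can be cross-checked numerically. The following sketch places a midpoint grid on the square and counts the grid points whose second preimage candidate $[x_1, 2x_1 - x_2]$ also lies in the square; the grid resolution `n` and the function name are assumptions of this illustration, not part of the example.

```python
def example1_loss_bits(a=1.0, n=200):
    """Midpoint-grid estimate of the information loss of Example 1 in bits."""
    h = 2.0 * a / n
    lossy_cells = 0
    for i in range(n):
        x1 = -a + (i + 0.5) * h
        for j in range(n):
            x2 = -a + (j + 0.5) * h
            # The second preimage [x1, 2*x1 - x2] lies in the square
            # exactly when |2*x1 - x2| <= a.
            if abs(2.0 * x1 - x2) <= a:
                lossy_cells += 1
    # Uniform input and unit Jacobian: both preimages are equally likely,
    # so each lossy point contributes one bit.
    one_bit = 1.0
    return one_bit * lossy_cells / (n * n)


print(example1_loss_bits())
```

Because the input is uniform, the estimate is simply the fraction of the square on which the map is two-to-one, and it does not depend on the side length parameter a.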

B. Example 2: Squaring a Gaussian RV 

Let $X$ be a zero-mean Gaussian RV with unit variance
and differential entropy $h(X) = \frac{1}{2}\ln(2\pi e)$ measured in nats.
We consider the square of this RV, $Y = g(X) = X^2$,
to illustrate the connection between information loss and
differential entropy. The square of a Gaussian RV, $Y$, is
$\chi^2$-distributed with one degree of freedom. Thus, the differential
entropy of $Y$ is given by [11]

\begin{align}
h(Y) &= \ln\left(2\Gamma\left(\tfrac{1}{2}\right)\right) + \frac{1}{2} + \frac{1}{2}\psi\left(\tfrac{1}{2}\right) \tag{28}\\
&= \frac{1}{2} + \frac{1}{2}\ln\pi - \frac{\gamma}{2} \tag{29}
\end{align}

where $\Gamma(\cdot)$ and $\psi(\cdot)$ are the gamma and digamma
functions [12, Ch. 6] and $\gamma$ is the Euler-Mascheroni
constant [12, pp. 3]. With some calculus we obtain for the
expected value of the derivative (taking the place of the
Jacobian determinant in the univariate case)

$$E\{\ln|2X|\} = \frac{1}{2}\ln 2 - \frac{\gamma}{2}. \tag{30}$$

Subtracting differential entropies and adding the expected
value of the derivative yields the information loss

\begin{align}
H(X|Y) &= h(X) - h(Y) + E\{\ln|2X|\} \tag{31}\\
&= \frac{1}{2}\ln(2\pi e) - \frac{1}{2} - \frac{1}{2}\ln\pi + \frac{1}{2}\ln 2 \tag{32}\\
&= \ln 2 \tag{33}
\end{align}

again measured in nats. Changing the base of the logarithm to
2 we obtain an information loss of one bit. This is in perfect
accordance with a previous result showing that the information
loss of a square-law device is equal to one bit if the PDF of
the input has even symmetry [7].

Fig. 3. PDF $f_X$ and piecewise linear function g of Example 3.
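The one-bit loss of the square-law device can also be obtained by evaluating Theorem 2 numerically instead of using differential entropies. The sketch below uses a plain midpoint rule; the integration limits, the step count, and the optional non-zero mean (which breaks the even symmetry) are assumptions added for illustration.

```python
import math


def normal_pdf(x, mean=0.0):
    """Unit-variance Gaussian density with the given mean."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def square_law_loss_bits(mean=0.0, lim=12.0, steps=20_000):
    """Midpoint-rule evaluation of (9) for g(x) = x**2, in bits.

    The preimage of g(x) is {x, -x} and the derivative terms |2x| cancel
    inside the logarithm, which leaves log2(1 + f_X(-x) / f_X(x)).
    """
    dx = 2.0 * lim / steps
    total = 0.0
    for i in range(steps):
        x = mean - lim + (i + 0.5) * dx
        fx = normal_pdf(x, mean)
        if fx > 0.0:
            total += fx * math.log2(1.0 + normal_pdf(-x, mean) / fx) * dx
    return total


print(square_law_loss_bits(0.0))   # even-symmetric input
print(square_law_loss_bits(1.0))   # shifted input, symmetry broken
```

For the zero-mean input the integrand reduces to the density itself, so the result is one bit up to the integration error; for a shifted mean the two preimages are no longer equally likely and the value drops below one bit.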

C. Example 3: Exponential RV and infinite bounds 

In this example we consider an exponential input with PDF

$$f_X(x) = \lambda e^{-\lambda x}, \quad x \ge 0 \tag{34}$$

and a piecewise linear function

$$g(x) = x - \frac{\lfloor \lambda x \rfloor}{\lambda}. \tag{35}$$

The PDF and the function are depicted in Fig. 3.



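We obviously have $\mathcal{X} = [0, \infty)$ and $\mathcal{Y} = [0, 1/\lambda)$, while g
partitions $\mathcal{X}$ into a countable number of intervals of length
$1/\lambda$. In other words,

$$\mathcal{X}_k = \left[\frac{k-1}{\lambda}, \frac{k}{\lambda}\right) \tag{36}$$

and $g(\mathcal{X}_k) = \mathcal{Y}$ for all $k = 1, 2, \dots$. From this follows that
for every $y \in \mathcal{Y}$ the preimage contains an element from each
subdomain $\mathcal{X}_k$; thus, the bounds from Theorem 4 all evaluate
to $H(X|Y) \le \infty$. However, it can be shown that the other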
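bound, $H(X|Y) \le H(W)$, is tight in this case: With

$$p_k = \int_{\mathcal{X}_k} f_X(x)\,dx = (1 - e^{-1})\,e^{-k+1} \tag{37}$$

we obtain $H(W) = -\log(1 - e^{-1}) + \frac{e^{-1}}{1 - e^{-1}} \log e \approx 1.50$ bit. The same
result is obtained for a direct evaluation of Theorem 2.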
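The tightness of the bound $H(W)$ in this example can be checked with a few lines of code. The sketch below computes the interval masses and the conditional distribution of the interval index $W$ given an output value $y$, using the fact that the conditional weights are proportional to the input PDF at the preimage points (unit slope on every interval). The truncation of the countable sums at 200 terms and all function names are assumptions of this illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def interval_masses(terms=200):
    """Masses P(X in [(k-1)/lam, k/lam)), k = 1..terms; they do not depend on lam."""
    return [(1.0 - math.exp(-1.0)) * math.exp(-(k - 1)) for k in range(1, terms + 1)]


def conditional_masses(y, lam=1.0, terms=200):
    """P(W = k | Y = y) for 0 <= y < 1/lam.

    The preimage of y contains the point y + (k-1)/lam in each interval, and
    the conditional weight of that point is proportional to the input PDF there.
    """
    weights = [lam * math.exp(-lam * (y + (k - 1) / lam)) for k in range(1, terms + 1)]
    total = sum(weights)
    return [w / total for w in weights]


h_w = entropy_bits(interval_masses())
for y in (0.0, 0.3, 0.9):
    print(y, entropy_bits(conditional_masses(y)), h_w)
```

The conditional distribution turns out to be the same for every $y$, so $W$ and $Y$ are independent in this example and removing the conditioning in Theorem 3 costs nothing.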

D. Example 4: An almost invertible transform with zero in- 
formation loss 

As a next example consider a two-dimensional RV $X$ which
places probability mass uniformly on the unit disc, i.e.,

$$f_X(x) = \begin{cases} \frac{1}{\pi}, & \text{if } \|x\| \le 1 \\ 0, & \text{else} \end{cases} \tag{38}$$

where $\|\cdot\|$ is the Euclidean norm. Thus, $\mathcal{X} = \{x \in \mathbb{R}^2 :
\|x\| \le 1\}$. The cartesian coordinates $x$ are now transformed
to polar coordinates in a special way, namely:

\begin{align}
y_1 &= \begin{cases} \|x\|, & \text{if } \|x\| < 1 \\ 0, & \text{else} \end{cases} \tag{39}\\
y_2 &= \begin{cases} \arctan\left(\frac{x_2}{x_1}\right) + \pi(1 - \operatorname{sgn}(x_1)), & \text{if } 0 < \|x\| < 1 \\ 0, & \text{else} \end{cases} \tag{40}
\end{align}

This mapping together with the domains of $X$ and $Y$ is
illustrated in Fig. 4 (left and upper right diagram).

As a direct consequence we have $\mathcal{Y} = (0, 1) \times [0, 2\pi) \cup
\{0, 0\}$. Observe that not only the point $x = \{0, 0\}$ is mapped
to the point $y = \{0, 0\}$, but that also the unit circle $S = \{x :
\|x\| = 1\}$ is mapped to $y = \{0, 0\}$. As a consequence, the
preimage of $\{0, 0\}$ under g is uncountable. However, since a
circle in $\mathbb{R}^2$ is a Lebesgue null-set and thus $P_X(S) = 0$, also
$P_Y(\{0, 0\}) = 0$ and the conditions of Theorem 5 are not met.
Indeed, since $H(X|Y = y) = 0$ $P_Y$-almost everywhere, it
can be shown that $H(X|Y) = 0$.

E. Example 5: A mapping to a subspace of lower dimension- 
ality 

Consider again a uniform distribution on the unit disc, as
it was used in Example 4. Now, however, let g be such that
only the radius is computed while the angle is lost, i.e.,

\begin{align}
y_1 &= \|x\| \tag{41}\\
y_2 &= 0 \tag{42}
\end{align}

Note that here only the origin $\{0, 0\}$ is mapped bijectively,
while for all other $y \in \mathcal{Y} = [0, 1] \times \{0\}$ the preimage under
g is uncountable (a circle around the origin with radius $y_1$).
Indeed, in this particular example, the probability measure
$P_Y$ is not discrete, but singular continuous: Each point has
zero $P_Y$-measure (circles are Lebesgue null-sets), but $P_Y$ is
not absolutely continuous w.r.t. the two-dimensional Lebesgue
measure $\mu$. Clearly, $\mu(\mathcal{Y}) = 0$ while $P_Y(\mathcal{Y}) = 1$. Since the
preimage is uncountable on a set of positive $P_Y$-measure, we
have $H(X|Y) = \infty$.

Fig. 4. Mapping of domains in Examples 4 and 5. The solid red circle in the
left diagram and the red dot in the upper right diagram correspond to each
other, illustrating the mapping of an uncountable $P_X$-null set to a point. The
lightly shaded areas are mapped bijectively in Example 4. In Example 5, the
disc in the left diagram is mapped to the solid red line in the lower right
diagram.

Fig. 5. Subdomains of Example 6. The functions $g_i$ restricted to a domain
of either color are bijective. Furthermore, the overall function g is bijective
in areas with light shading.

Fig. 6. Information loss H(X|Y) of Example 6 for a = 2.

F. Example 6: Another two-dimensional transform with finite 
information loss 

Finally, consider a uniform distribution on a triangle defined
by

$$\mathcal{X} = \{x \in \mathbb{R}^2 : x_1 \in [m - a, m + a],\; x_2 \in [-m - a, -x_1]\} \tag{43}$$

where $0 \le m \le a$ and $a > 0$. Thus, the PDF of $X$ is given
as $f_X(x) = \frac{1}{2a^2}$ if $x \in \mathcal{X}$ and zero elsewhere (see Fig. 5).
The function g takes the magnitude of each coordinate, i.e.,
$y_i = |x_i|$, where $i = 1, 2$. We now try to derive the information
loss as a function of $m$.

First we can identify three subsets of $\mathcal{X}$ which are mapped
bijectively by restricting g to these sets, namely $\mathcal{X}_1 = \{x \in
\mathcal{X} : x_1 < 0, x_2 > 0\}$, $\mathcal{X}_2 = \{x \in \mathcal{X} : x_1 < 0, x_2 < 0\}$, and
$\mathcal{X}_3 = \{x \in \mathcal{X} : x_1 > 0, x_2 < 0\}$. Furthermore, for $m > 0$
a part of $\mathcal{X}_3$ is mapped bijectively by g (lighter shading in
Fig. 5). This subset $\mathcal{X}_b$ is a right triangle with legs of length $2m$,
so the probability mass contained in it
can be shown to equal $P_b = P_X(\mathcal{X}_b) = \frac{m^2}{a^2}$. For all other possible
input values $x$ the preimage of $g(x)$ has exactly two elements:
One of them is located in $\mathcal{X}_2$, the other either in $\mathcal{X}_1$ or in
$\mathcal{X}_3 \setminus \mathcal{X}_b$. Due to the uniformity of $X$ and since the Jacobian
determinant is identical to unity for all $x \in \mathcal{X}$, both of these
preimages are equally likely. Thus, on $\mathcal{X} \setminus \mathcal{X}_b$, the information
loss is identical to one bit. In other words,

$$H(X|Y = y) = 1 \tag{44}$$

for all $y \in g(\mathcal{X} \setminus \mathcal{X}_b)$. We therefore obtain with $P_b = \frac{m^2}{a^2}$ an
information loss equal to $H(X|Y) = 1 - \frac{m^2}{a^2}$.

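From the probability masses contained in the sets $\mathcal{X}_1$, $\mathcal{X}_2$,
and $\mathcal{X}_3$ we can compute an upper bound on the information
loss:

$$H(X|Y) \le H(W) = -\sum_{i=1}^{3} p_i \log p_i, \quad p_1 = \frac{(a-m)^2}{4a^2},\; p_2 = \frac{a^2 - m^2}{2a^2},\; p_3 = \frac{(a+m)^2}{4a^2}. \tag{45}$$

And evaluating the bounds of Theorem 4 yields

$$H(X|Y) \le 1 - \frac{m^2}{a^2} \le \log\left(2 - \frac{m^2}{a^2}\right) \le 1 \tag{46}$$

which for $m = 0$ all reduce to one bit. In particular, it can
be seen that in this case the smallest bound of Theorem 4 is
exact.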

The exact information loss, together with the second smallest
bound from Theorem 4 and with the bound from $H(W)$,
is shown in Fig. 6. As can be seen, the closer the parameter
$m$ approaches $a$, the smaller the information loss gets.
Conversely, for $m = 0$ the information loss is exactly one bit.
Moreover, it turns out that the bound from $H(W)$ is rather
loose in this case.
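The quantities of this example can be cross-checked numerically. The sketch below counts, on a midpoint grid over the triangle of (43), how many sign combinations of $(|x_1|, |x_2|)$ fall back into the triangle, and compares the resulting loss with closed-form expressions. Those closed forms (a bijectively mapped mass of $m^2/a^2$ and the three subset masses) are derived here from the geometry of the triangle and should be read as assumptions of this illustration; the function names and the grid resolution are hypothetical.

```python
import math


def in_triangle(x1, x2, m, a):
    """Membership test for the triangle defined in (43)."""
    return m - a <= x1 <= m + a and -m - a <= x2 <= -x1


def example6_closed_form(m, a):
    """Return (exact loss, Jensen-type bound, coarsest bound, H(W)) in bits."""
    p_b = (m / a) ** 2                     # assumed mass of the bijectively mapped part
    masses = [(a - m) ** 2 / (4 * a * a),
              (a * a - m * m) / (2 * a * a),
              (a + m) ** 2 / (4 * a * a)]
    h_w = -sum(p * math.log2(p) for p in masses if p > 0.0)
    return 1.0 - p_b, math.log2(2.0 - p_b), 1.0, h_w


def example6_grid_loss(m, a, n=200):
    """Grid estimate: average of log2(number of preimages) over the triangle."""
    h = 2.0 * a / n
    inside, loss = 0, 0.0
    for i in range(n):
        x1 = m - a + (i + 0.5) * h
        for j in range(n):
            x2 = -m - a + (j + 0.5) * h
            if not in_triangle(x1, x2, m, a):
                continue
            inside += 1
            # Candidate preimages of (|x1|, |x2|) are the four sign combinations.
            candidates = {(s1 * abs(x1), s2 * abs(x2))
                          for s1 in (1, -1) for s2 in (1, -1)}
            count = sum(in_triangle(c1, c2, m, a) for c1, c2 in candidates)
            loss += math.log2(count)       # uniform input, unit Jacobian
    return loss / inside


print(example6_closed_form(0.5, 1.0))
print(example6_grid_loss(0.5, 1.0))
```

The grid estimate only approximates the integral, so small deviations from the closed form are expected near the boundaries of the subdomains.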



VII. Conclusion 

In this work, we proposed a mathematically concise def- 
inition of information loss for the purpose of establishing a 
system theory from an information-theoretic point of view. 
For a certain class of multivariate, vector-valued functions and 
continuous input variables this information loss was quantified, 
and the result is accompanied by convenient upper bounds. We 
further showed a connection between information loss and the 
differential entropies of the input and output variables. 

Finally, a class of systems has been identified for which 
the information loss is infinite. Vector-quantizers and limiters 
belong to that class, but also functions which project the input 
space onto a space of lower dimensionality. 

Appendix 
Proof of Theorem 2

For the proof we use (2) of Theorem 1, where we take
the limit of a sequence of increasingly fine partitions $\mathcal{P}_n =
\{\mathcal{X}_k^{(n)}\}$ satisfying (5). For a given $n$ we write the resulting
mutual information $I(\hat{X}_n;X)$ as

$$I(\hat{X}_n;X) = E\left\{ D\left( f_{X|\hat{X}_n}(\cdot, \hat{x}) \,\|\, f_X(\cdot) \right) \right\} \tag{47}$$

where $D(\cdot\|\cdot)$ denotes the Kullback-Leibler divergence and the
expectation is w.r.t. $\hat{X}_n$. Note that for each possible outcome
$\hat{x}_k$ of $\hat{X}_n$ the conditional probability measure $P_{X|\hat{X}_n = \hat{x}_k}$ is absolutely
continuous w.r.t. the Lebesgue measure (cf. Section V). It thus
possesses a density

$$f_{X|\hat{X}_n}(x, \hat{x}_k) = \begin{cases} \frac{f_X(x)}{p(\hat{x}_k)}, & \text{if } x \in \mathcal{X}_k^{(n)} \\ 0, & \text{else} \end{cases} \tag{48}$$

where $p(\hat{x}_k) = P_X(\mathcal{X}_k^{(n)})$. With the definition of the
Kullback-Leibler divergence [3, Lemma 5.2.3] and [8, Thm. 5-1]
we can write the difference of mutual informations in
Theorem 1 as

$$I(\hat{X}_n;X) - I(\hat{X}_n;Y) = \sum_k p(\hat{x}_k) \int_{\mathcal{X}_k^{(n)}} \frac{f_X(x)}{p(\hat{x}_k)} \log\left( \frac{f_{X|\hat{X}_n}(x, \hat{x}_k)\, f_Y(g(x))}{f_{Y|\hat{X}_n}(g(x), \hat{x}_k)\, f_X(x)} \right) dx.$$

Rewriting with the indicator function

$$1_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{else} \end{cases} \tag{49}$$

this yields

$$I(\hat{X}_n;X) - I(\hat{X}_n;Y) = \int_{\mathcal{X}} f_X(x) \sum_k 1_{\mathcal{X}_k^{(n)}}(x) \log\left( \frac{f_{X|\hat{X}_n}(x, \hat{x}_k)\, f_Y(g(x))}{f_{Y|\hat{X}_n}(g(x), \hat{x}_k)\, f_X(x)} \right) dx. \tag{50}$$

We can now exploit the relationship (8) for the conditional
PDF of $Y$ given $\hat{X}_n$, and with (48) we realize that the function
under the integral is monotonically increasing in $n$: Indeed, for
finer partitions it is less likely that any element of the preimage
$g^{-1}[g(x)]$ other than $x$ lies in $\mathcal{X}_k^{(n)}$, thus $f_{Y|\hat{X}_n}(g(x), \hat{x}_k)$
converges to $\frac{f_X(x)}{p(\hat{x}_k)\,|\det J_g(x)|}$. This holds for all $k$; thus, invoking
the monotone convergence theorem [10, pp. 21] and cancelling
the conditional PDFs eliminates the dependence on $k$ and the
sum over indicator functions ($\bigcup_k \mathcal{X}_k^{(n)} = \mathcal{X}$). Substituting the
PDF of $Y$ with (8) completes the proof. ■